Tracking the Intangible: Quantifying Effort in NFL Running Backs

Authors

Emily Shteynberg

Luke Snavely

Sheryl Solorzano

Last updated

July 25, 2025

Image source: The Tower


Introduction

Describe the problem and why it is important.

American football is one of the most-watched sports in the U.S., known for its quick decision-making, complex tactics, and athletically demanding displays of strength, endurance, and speed.

Data

  • Describe the data you’re using in detail, where you accessed it, along with relevant exploratory data analysis (EDA). You should also include descriptions of any relevant data pre-processing steps (e.g., whether you consider specific observations, create any meaningful features, etc.—but don’t mention minor steps like column type conversion, filtering out unnecessary rows)

  • The data for this project come from the NFL Big Data Bowl 2022 dataset on Kaggle (NFL Big Data Bowl 2022).

  • We limited our dataset to NFL running backs with more than 20 rushes.
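The rush-count filter described above can be sketched in base R. This is only an illustration on made-up data: the column name `rusher_id` and the toy rows are assumptions, not the layout of the Kaggle files.

```r
# Toy play-level data: one row per rushing attempt.
# `rusher_id` is a hypothetical column name, not taken from the Kaggle files.
rushes <- data.frame(
  rusher_id = c(rep("RB_A", 25), rep("RB_B", 20), rep("RB_C", 3))
)

# Count attempts per rusher, then keep rushers with strictly more than 20.
min_rushes <- 20
attempts <- table(rushes$rusher_id)
eligible_ids <- names(attempts)[as.vector(attempts) > min_rushes]
rushes_filtered <- rushes[rushes$rusher_id %in% eligible_ids, , drop = FALSE]

eligible_ids
```

Note that a rusher with exactly 20 attempts (`RB_B` in the toy data) is dropped, since the threshold is "more than 20".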

Methods

  • Describe the modeling techniques you chose, their assumptions, justifications for why they are appropriate for the problem, and how you’re comparing/evaluating the different methods.

  • We used code by Dr. Ron Yurko and Quang Nguyen to calculate the distance from the nearest defender (Nguyen 2023).

  • We based our AS/AKE curves on the article “Individual acceleration-speed profile in-situ: A proof of concept in professional football players” (Morin et al. 2021).

  • Still using the non-linear quantile regression plot? (Ding 2024)
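As a rough illustration of the nearest-defender feature mentioned above, the sketch below computes, for each frame, the Euclidean distance from the rusher to the closest defender. It is not the code from Nguyen (2023); the data frame, its column names (`frame`, `role`, `x`, `y`), and the assumption of exactly one rusher per frame are all hypothetical.

```r
# Toy tracking data for two frames of one play (coordinates in yards).
# All column names and values here are hypothetical.
tracking <- data.frame(
  frame = c(1, 1, 1, 2, 2, 2),
  role  = c("rusher", "defender", "defender", "rusher", "defender", "defender"),
  x     = c(10, 13, 20, 11, 12, 19),
  y     = c(20, 24, 20, 20, 20, 21)
)

# Distance from the single rusher in a frame to the closest defender.
nearest_defender_dist <- function(frame_df) {
  rusher    <- frame_df[frame_df$role == "rusher", ]
  defenders <- frame_df[frame_df$role == "defender", ]
  min(sqrt((defenders$x - rusher$x)^2 + (defenders$y - rusher$y)^2))
}

# Apply the helper frame by frame.
dist_by_frame <- sapply(split(tracking, tracking$frame), nearest_defender_dist)
dist_by_frame
```

In the toy data the closest defender is 5 yards away in frame 1 and 1 yard away in frame 2.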

Results

Describe your results. This can include tables and plots showing your results, as well as text describing how your models worked and the appropriate interpretations of the relevant output. (Note: Don’t just write out the textbook interpretations of all model coefficients. Instead, interpret the output that is relevant for your question of interest that is framed in the introduction)

Discussion

Give your conclusions and summarize what you have learned with regards to your question of interest. Are there any limitations with the approaches you used? What do you think are the next steps to follow-up your project?

Appendix

Non-linear quantile regression for acceleration vs speed
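One way such a fit could be set up is sketched below, using only base R: a curve is fit to an upper quantile of acceleration given speed by minimizing the quantile (pinball) loss with `optim()`. Everything specific here is an assumption for illustration: the data are simulated, the exponential-decay curve is a hypothetical functional form, and `tau = 0.9` is an arbitrary choice. The `nlrq()` function in the quantreg package is a packaged alternative to this hand-rolled version.

```r
set.seed(1)

# Simulated speed (yd/s) and acceleration (yd/s^2); not real tracking data.
n <- 500
speed <- runif(n, 0, 10)
accel <- 6 * exp(-0.25 * speed) + rnorm(n, sd = 0.5)

# Hypothetical curve: acceleration decays exponentially with speed.
curve_fn <- function(par, s) par[1] * exp(-par[2] * s)

# Pinball loss: residuals above the curve are weighted by tau,
# residuals below it by (1 - tau).
pinball_loss <- function(par, s, a, tau) {
  r <- a - curve_fn(par, s)
  mean(r * (tau - (r < 0)))
}

# Fit the 0.9 quantile curve with Nelder-Mead (optim's default method).
tau <- 0.9
fit <- optim(c(5, 0.2), pinball_loss, s = speed, a = accel, tau = tau)

# Share of observations on or below the fitted curve; should be near tau.
prop_below <- mean(accel <= curve_fn(fit$par, speed))
prop_below
```

A quick sanity check for any quantile fit is that roughly a fraction `tau` of the points falls on or below the fitted curve.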

Appendix: A quick tutorial

(Feel free to remove this section when you submit)

This is a Quarto document. To learn more about Quarto see https://quarto.org. You can use the Render button to see what it looks like in HTML.

Text formatting

Text can be bolded with double asterisks and italicized with single asterisks. Monospace text, such as for short code snippets, uses backticks. (Note these are different from quotation marks or apostrophes.) Links are written like this.

Bulleted lists can be written with asterisks:

  • Each item starts on a new line with an asterisk.
  • Items should start on the beginning of the line.
  • Leave blank lines after the end of the list so the list does not continue.

Mathematics can be written with LaTeX syntax using dollar signs. For instance, using single dollar signs we can write inline math: $(-b \pm \sqrt{b^2 - 4ac})/2a$.

To write math in “display style”, i.e. displayed on its own line centered on the page, we use double dollar signs: $$x^2 + y^2 = 1$$

Code blocks

Code blocks are evaluated sequentially when you hit Render. As the code runs, R prints out which block is running, so naming blocks is useful if you want to know which one takes a long time. After the block name, you can specify chunk options. For example, echo controls whether the code is printed in the document. By default, output is printed in the document in monospace:

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Chunk options can also be written inside the code block, which is helpful for really long options, as we’ll see soon.

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
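The output above matches the first six rows of R's built-in `mtcars` data, so a chunk along the following lines would likely reproduce it. The chunk label is a hypothetical name; `#|` lines are how Quarto reads options written inside a block, and R itself treats them as ordinary comments.

```r
#| label: show-mtcars
#| echo: false

# Print the first six rows of the built-in mtcars data frame.
head(mtcars)
```

With `echo: false`, only the printed rows appear in the rendered report, not the code.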

Figures

If a code block produces a plot or figure, this figure will automatically be inserted inline in the report. That is, it will be inserted exactly where the code block is.

This is a caption. It should explain what’s in the figure and what’s interesting about it. For instance: There is a negative, strong linear correlation between miles per gallon and horsepower for US cars in the 1970s.

Notice the use of fig-width and fig-height to control the figure’s size (in inches). These control the sizes given to R when it generates the plot, so R proportionally adjusts the font sizes to be large enough.
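A figure chunk with these options might look like the sketch below. The label, the sizes, and the caption text are illustrative choices, and base R `plot()` is used here only to keep the example free of extra packages.

```r
#| label: fig-mpg-hp
#| fig-width: 5
#| fig-height: 3.5
#| fig-cap: "Fuel economy falls as horsepower rises in the mtcars data."

# Scatterplot of miles per gallon against horsepower.
plot(mtcars$hp, mtcars$mpg,
     xlab = "Horsepower", ylab = "Miles per gallon", pch = 19)

# Add the least-squares line to show the overall trend.
abline(lm(mpg ~ hp, data = mtcars), lty = 2)

# Correlation behind the caption's claim of a negative relationship.
r <- cor(mtcars$hp, mtcars$mpg)
r
```

Changing `fig-width` and `fig-height` changes the size R draws at, so text in the plot stays legible without manual font tweaks.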

Tables

Use the knitr::kable() function to print tables as HTML:

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

We can summarize model results with a table. For instance, suppose we fit a linear regression model:

model1 <- lm(mpg ~ disp + hp + drat, data = mtcars)

It is not appropriate to simply print summary(model1) into the report. If we want the reader to understand what models we have fit and what their results are, we should provide a nicely formatted table. A simple option is to use the tidy() function from the broom package to get a data frame of the model fit, and simply report that as a table.

Predicting fuel economy using vehicle features.
Term         Estimate    SE      t     p
(Intercept)     19.34  6.37   3.04  0.01
disp            -0.02  0.01  -2.05  0.05
hp              -0.03  0.01  -2.34  0.03
drat             2.71  1.49   1.83  0.08
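A table like the one above could be produced roughly as follows, assuming the broom and knitr packages are installed. The column labels and the rounding to two digits are choices made to match the table shown, not requirements of either package.

```r
library(broom)
library(knitr)

# Fit the model from the text on the built-in mtcars data.
model1 <- lm(mpg ~ disp + hp + drat, data = mtcars)

# tidy() returns one row per coefficient with columns
# term, estimate, std.error, statistic, and p.value.
coef_table <- tidy(model1)

# Render the coefficients as a formatted table with readable headers.
kable(coef_table,
      digits = 2,
      col.names = c("Term", "Estimate", "SE", "t", "p"),
      caption = "Predicting fuel economy using vehicle features.")
```

Because `tidy()` returns an ordinary data frame, rows can be filtered or renamed before the table is passed to `kable()`.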

References

Ding, Peng. 2024. Linear Models and Extensions. Chapman & Hall. https://arxiv.org/pdf/2401.00649.
Morin, Jean-Benoit, Yann Le Mat, Cristian Osgnach, Andrea Barnabò, Alessandro Pilati, Pierre Samozino, and Pietro E. di Prampero. 2021. “Individual Acceleration-Speed Profile in-Situ: A Proof of Concept in Professional Football Players.” Journal of Biomechanics 123: 110524. https://doi.org/10.1016/j.jbiomech.2021.110524.
NFL Big Data Bowl. 2022. “NFL Big Data Bowl 2022 Dataset.” Kaggle. https://www.kaggle.com/competitions/nfl-big-data-bowl-2022/data.
Nguyen, Quang. 2023. “Turn-Angle.” https://github.com/qntkhvn/turn-angle/blob/main/scripts/01a_prep_rusher_data.R.